Stochastic attributed K-d tree modeling of technical paper title pages
Abstract
Structural information about a document is essential for structured query processing, indexing, and retrieval. A document page can be partitioned into a hierarchy of homogeneous regions such as columns and paragraphs; these regions, called physical components, define the physical layout of the page. In this paper we develop a class of models for the physical layouts of technical paper title pages. We model physical layout using hidden semi-Markov models for the directional projections of page regions, and a stochastic attributed K-d tree grammar model for the 2D hierarchical structure of these regions. We use the models to generate sets of synthetic title page images in three distinct styles, which we then use in controlled experiments on page structure analysis.
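To make the two model components concrete, here is a minimal sketch under stated assumptions; it is not the paper's method. The grammar in `GRAMMAR` (page, then title/authors/abstract/body stacked vertically, body split into two columns, each column holding 2 to 5 paragraphs), the uniform size attributes, and the duration and emission ranges in `sample_projection` are all hypothetical. `expand` grows a K-d tree by alternating horizontal and vertical cuts with randomly chosen child counts and sizes. `sample_projection` draws a 1D projection profile from a two-state hidden semi-Markov model whose "ink" (text line) and "gap" states have explicit duration distributions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Region:
    """An axis-aligned page region; leaves are physical components."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float
    children: list = field(default_factory=list)


# Hypothetical grammar: label -> (split axis, [(child label, min count, max count)]).
# Axis "y" stacks children top to bottom; axis "x" places them left to right,
# so cut directions alternate down the tree as in a 2-d tree.
GRAMMAR = {
    "page": ("y", [("title", 1, 1), ("authors", 1, 1), ("abstract", 1, 1), ("body", 1, 1)]),
    "body": ("x", [("column", 2, 2)]),
    "column": ("y", [("paragraph", 2, 5)]),
}


def expand(region, rng):
    """Recursively expand a region; labels missing from GRAMMAR are terminals."""
    rule = GRAMMAR.get(region.label)
    if rule is None:
        return region
    axis, parts = rule
    labels = []
    for child_label, lo, hi in parts:
        labels.extend([child_label] * rng.randint(lo, hi))
    # Stochastic attribute: relative size of each child along the split axis.
    weights = [rng.uniform(0.5, 1.5) for _ in labels]
    total = sum(weights)
    start = region.y0 if axis == "y" else region.x0
    extent = (region.y1 - region.y0) if axis == "y" else (region.x1 - region.x0)
    for child_label, w in zip(labels, weights):
        end = start + extent * w / total
        if axis == "y":
            child = Region(child_label, region.x0, start, region.x1, end)
        else:
            child = Region(child_label, start, region.y0, end, region.y1)
        region.children.append(expand(child, rng))
        start = end
    return region


def leaves(region):
    """Return the terminal regions (physical components) in reading order."""
    if not region.children:
        return [region]
    return [leaf for child in region.children for leaf in leaves(child)]


def sample_projection(length, rng):
    """Sample a projection profile from a two-state hidden semi-Markov model.

    Each state's duration is drawn explicitly before emitting, which is what
    distinguishes a semi-Markov model from a plain HMM.
    """
    durations = {"ink": (8, 14), "gap": (3, 6)}  # hypothetical run lengths (pixels)
    emission = {"ink": (20, 60), "gap": (0, 2)}  # hypothetical black-pixel counts
    profile, states = [], []
    state = "gap"
    while len(profile) < length:
        for _ in range(rng.randint(*durations[state])):
            profile.append(rng.randint(*emission[state]))
            states.append(state)
        state = "ink" if state == "gap" else "gap"
    return profile[:length], states[:length]


if __name__ == "__main__":
    rng = random.Random(0)
    page = expand(Region("page", 0.0, 0.0, 8.5, 11.0), rng)
    for leaf in leaves(page):
        print(leaf.label, [round(v, 2) for v in (leaf.x0, leaf.y0, leaf.x1, leaf.y1)])
    profile, _ = sample_projection(40, rng)
    print(profile)
```

Because every split tiles its parent exactly, the leaves always partition the page; rendering text into those leaves (with profiles like the sampled one) would be one way to produce synthetic page images, though the paper's own rendering and style parameters are not described here.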
Related articles
Stochastic k-Tree Grammar and Its Application in Biomolecular Structure Modeling
Stochastic context-free grammar (SCFG) has been successful in modeling biomolecular structures, typically RNA secondary structure, for statistical analysis and structure prediction. Context-free grammar rules specify parallel and nested co-occurrences of terminals, and thus are ideal for modeling nucleotide canonical base pairs that constitute the RNA secondary structure. Stochastic grammars h...
Confidence Interval Estimation of the Mean of Stationary Stochastic Processes: a Comparison of Batch Means and Weighted Batch Means Approach (TECHNICAL NOTE)
Suppose that we have one run of n observations of a stochastic process by means of computer simulation and would like to construct a confidence interval for the steady-state mean of the process. Seeking independent observations, so that the classical statistical methods could be applied, we can divide the n observations into k batches of length m (n = k·m) or alternatively, transform the cor...
Modeling with extended fault trees
In the areas of both safety and reliability analysis the precise modeling of complex technical systems during development and for evaluation purposes is of great importance. Traditionally, fault tree models have been used to accomplish this, and, more recently, stochastic Petri-net models have begun to be employed. To provide engineers with an intuitive high-level modeling interface to Petri-ne...
An efficient algorithm for finding the semi-obnoxious $(k,l)$-core of a tree
In this paper we study the problem of finding the $(k,l)$-core of a tree whose vertices have positive or negative weights. Let $T=(V,E)$ be a tree. The $(k,l)$-core of $T$ is a subtree with at most $k$ leaves and a diameter of at most $l$ for which the sum of the weighted distances from all vertices to the subtree is minimized. We show that, when the sum of the weights of vertices is negative, t...
Multi-horizon stochastic programming
@article{KautEA12, author = {Michal Kaut and Kjetil T. Midthun and Adrian S. Werner and Asgeir Tomasgard and Lars Hellemo and Marte Fodstad}, title = {Multi-horizon stochastic programming}, journal = {Computational Management Science}, year = {2014}, volume = {11}, number = {1--2}, pages = {179--193}, note = {Special Issue: Computational Techniques in Management Science}, doi = {10.1007/s10287-...